GnosisMiner: Reading Order Recommendations over Document Collections
Authors
Abstract
Given a document collection, existing systems let users locate documents either by searching with keywords or by navigating a predefined organization of the collection. Other approaches help users understand a collection by generating summaries or clusters of its documents. However, users often want to understand how the documents relate to one another and to read them in some logical order. In this work, we present GnosisMiner, an interactive reading recommendation system. Given a collection of documents and a theme, the system returns a partial order of the documents relevant to that theme, organized from more general to more specific. The recommended reading order mirrors how people typically learn: we start with more general documents that help us understand the domain, and then proceed to more specialized documents that deepen our knowledge of the subject.
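To make the idea of a general-to-specific partial order concrete, here is a minimal Python sketch. It is not GnosisMiner's method, which the abstract does not describe. It rests on several assumptions made purely for illustration: each document is a set of terms, a document is relevant when it contains the theme term, and document A is "more general" than document B when A's term set is a proper subset of B's. The function name `reading_levels` and the sample data are hypothetical.

```python
def reading_levels(docs, theme):
    """Group theme-relevant documents into reading levels, most general first.

    docs:  dict mapping a document id to its set of terms (assumed representation).
    theme: a single term; a document counts as relevant if it contains it.

    Assumption (not from the paper): A precedes B when A's terms are a
    proper subset of B's terms. Proper-subset is a strict partial order,
    so documents that are incomparable can share a level.
    """
    relevant = {d: terms for d, terms in docs.items() if theme in terms}

    level = {}
    # A proper subset is always smaller, so visiting documents by term-set
    # size guarantees every predecessor is already assigned a level.
    for d in sorted(relevant, key=lambda d: (len(relevant[d]), d)):
        preds = [p for p in level if relevant[p] < relevant[d]]
        level[d] = 1 + max((level[p] for p in preds), default=-1)

    grouped = {}
    for d, k in level.items():
        grouped.setdefault(k, []).append(d)
    return [sorted(grouped[k]) for k in sorted(grouped)]


# Hypothetical collection, used only to exercise the sketch.
docs = {
    "intro": {"databases"},
    "sql": {"databases", "sql"},
    "index": {"databases", "btree"},
    "tuning": {"databases", "sql", "optimizer"},
    "cooking": {"recipes"},
}
levels = reading_levels(docs, "databases")
```

Here `levels` places "intro" first, then "index" and "sql" together because neither term set contains the other, then "tuning". Returning levels rather than a single ranked list keeps the order partial: documents in the same level can be read in either sequence.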
Similar resources
Summarization of Changes in Dynamic Text Collections
Information Retrieval is the Informatics field primarily focused on all problems and challenges related to information storage and access. The large majority of works in this area are based on static collections of documents. However, many of these collections are dynamic, and have evolved over time with documents being added, edited or simply removed at different times. Even in highly dynamic ...
Creating synthetic temporal document collections
In research in temporal document databases, large temporal document collections are necessary in order to be able to compare and evaluate new strategies and algorithms. Large temporal document collections are not easily available, and an alternative is to create synthetic document collections. In this paper we will describe how to generate synthetic temporal document collections, how this is re...
Document understanding for a broad class of documents
We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and spatial relations to the textual features and content are employed in the analysis. To deal effectively with these information sources, we define a document representation general and flexible enough to represent comp...
Leipzig Corpus Miner - A Text Mining Infrastructure for Qualitative Data Analysis
This paper presents the “Leipzig Corpus Miner”—a technical infrastructure for supporting qualitative and quantitative content analysis. The infrastructure aims at the integration of “close reading” procedures on individual documents with procedures of “distant reading”, e.g. lexical characteristics of large document collections. Therefore information retrieval systems, lexicometric statistics a...
Sampling strategies for information extraction over the deep web
Information extraction systems discover structured information in natural language text. Having information in structured form enables much richer querying and data mining than possible over the natural language text. However, information extraction is a computationally expensive task, and hence improving the efficiency of the extraction process over large text collections is of critical intere...